accuracy measure

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers are for draft copy at present; they will be replaced with correct numbers when final book is formatted. Chapter numbers are correct and will not change now.

Although we use the single term accuracy in day-to-day speech, there are many different kinds of accuracy measures depending on the kind of data and application. Often the different measures conflict: getting the best accuracy on one metric means sacrificing accuracy on another, as with the precision–recall trade-off.

For numeric data the most common measure is the root mean square (RMS) error, in part because it has nice statistical properties; for example, linear regression is about finding the line through the data that minimises the RMS error. RMS is affected particularly strongly by a small number of extreme values, so the average absolute difference may be used instead. If we are interested in worst-case scenarios, the maximum difference may be more useful.
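As a minimal sketch (in Python, with made-up example values rather than data from the book), the three numeric measures can be computed directly; note how a single extreme prediction inflates the RMS error more than the average absolute difference.

```python
import numpy as np

# Illustrative values only: one prediction (9.0 vs 7.0) is far off the mark.
actual = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
predicted = np.array([2.8, 5.4, 2.0, 9.0, 4.6])

errors = predicted - actual

rms_error = np.sqrt(np.mean(errors ** 2))   # penalises large errors heavily
mean_abs_error = np.mean(np.abs(errors))    # less sensitive to extreme values
max_abs_error = np.max(np.abs(errors))      # worst-case difference

print(f"RMS error:           {rms_error:.3f}")
print(f"Mean absolute error: {mean_abs_error:.3f}")
print(f"Maximum difference:  {max_abs_error:.3f}")
```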

For classifications, even binary choices, the situation is yet more complex. Binary choices have two main kinds of error: false positives, when we assign something to a class (say a disease diagnosis) but it is actually not in the class, and false negatives, when we fail to recognise a true diagnosis. If the probability of a false positive is low we have high precision, and if the probability of a false negative is low we have high recall; which we want depends on the relative costs of the different kinds of error. These are sometimes combined into a single measure, most commonly the F-score. If we have evidence (say a confidence measure from a machine learning algorithm) and use a threshold to determine our decisions, then increasing the threshold means we may have more false negatives, whereas reducing it means we have more false positives. The ROC curve visualises this trade-off.
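The sketch below (Python, with invented labels and confidence scores purely for illustration) computes precision, recall and the F-score at three thresholds, showing how raising the threshold trades recall for precision.

```python
import numpy as np

labels = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # true classes (invented)
scores = np.array([0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1, 0.7, 0.45])  # classifier confidence

def precision_recall_f(labels, scores, threshold):
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))    # true positives
    fp = np.sum(predicted & (labels == 0))    # false positives
    fn = np.sum(~predicted & (labels == 1))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Raising the threshold reduces false positives (higher precision)
# but increases false negatives (lower recall).
for threshold in (0.3, 0.5, 0.7):
    p, r, f = precision_recall_f(labels, scores, threshold)
    print(f"threshold {threshold:.1f}: precision {p:.2f}, recall {r:.2f}, F-score {f:.2f}")
```

Sweeping the threshold over its full range and plotting the resulting error rates against each other gives the ROC curve mentioned above.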

Used in Chap. 9: pages 130, 139

Also known as accuracy metrics

ROC curve – trade-off between false positive and false negative rates